Feat/gemma4 adapters by huseyincavusbi · Pull Request #1385 · TransformerLensOrg/TransformerLens

huseyincavusbi · 2026-06-13T12:07:24Z

Description

This PR adds TransformerBridge support for the Gemma 4 model family (E2B, E4B, 26B-A4B, and 31B) through a single unified Gemma4ArchitectureAdapter.

Key Implementation Details

Unified Adapter (gemma4.py): Dynamically handles all 4 variants by evaluating initialization configuration flags:
- MoE Blocks: MoE submodules are created only when enable_moe_block=True (the 26B-A4B variant).
- KV-Sharing: K/V submodules (k_proj, v_proj, k_norm, v_norm) are optional and omitted on KV-sharing layers when num_kv_shared_layers > 0 (for E2B/E4B).
- PLE Embeddings: Surfaced dynamically when hidden_size_per_layer_input > 0.
- Weight Processing: Maps and converts Gemma 4's joint QKV layout, dual RoPE configurations, alternating sliding/full attention mechanisms, logit softcapping, and RMSNorm.
- Includes 45 dedicated unit tests verifying config attributes, MoE behavior, and weight conversions.
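
To illustrate the flag-driven dispatch described in the list above, here is a minimal sketch. The `ToyGemma4Config` class and `select_optional_features` helper are hypothetical names invented for this example; only the three flag names come from the description, and the real adapter in gemma4.py may structure this differently.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ToyGemma4Config:
    """Hypothetical stand-in holding only the flags named in the PR description."""

    enable_moe_block: bool = False
    num_kv_shared_layers: int = 0
    hidden_size_per_layer_input: int = 0


def select_optional_features(cfg: ToyGemma4Config) -> Dict[str, bool]:
    """Decide which optional submodule groups a single adapter would enable."""
    return {
        # MoE submodules only exist when the config asks for them.
        "moe": cfg.enable_moe_block,
        # KV-sharing layers reuse K/V from earlier layers.
        "kv_sharing": cfg.num_kv_shared_layers > 0,
        # Per-layer embeddings (PLE) are present only with a non-zero width.
        "ple": cfg.hidden_size_per_layer_input > 0,
    }


moe_features = select_optional_features(ToyGemma4Config(enable_moe_block=True))
```

Keeping the decision in one place like this is one way a single adapter could cover all four variants without per-variant subclasses.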
Shared-Library Updates (3 files, fully opt-in, zero regressions on existing adapter tests):
1. position_embeddings_attention.py: Applies V norm post-reshape (Gemma 4 is the first architecture featuring per-head value normalization). Handles KV-sharing delegation to Hugging Face's original attention implementation when K/V submodules are omitted. Caches computed KV states in shared_kv_states post-RoPE for structural layer reuse.
2. bridge.py: Introduces a use_native_generate opt-in flag. This bypasses a current Hugging Face transformers dev-version issue where eager attention causes a KV-cache dimension mismatch during generation. Setting this flag (scoped strictly to this adapter) delegates processing to HF's native generate() utilizing SDPA.
3. main_benchmark.py: Fixes pad_token_id assignment when eos_token_id is a list (Gemma4 uses [1, 106]), taking the first element.
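
The pad_token_id fix in item 3 can be sketched roughly as follows. The helper name `resolve_pad_token_id` is hypothetical and the actual code in main_benchmark.py may look different; the only behavior taken from the description is "take the first element when eos_token_id is a list".

```python
from typing import Optional, Sequence, Union


def resolve_pad_token_id(
    eos_token_id: Union[int, Sequence[int], None],
) -> Optional[int]:
    """Pick a single pad token id from an eos_token_id that may be a list."""
    if eos_token_id is None:
        return None
    if isinstance(eos_token_id, (list, tuple)):
        # Gemma4-style configs carry several EOS ids; use the first one.
        return eos_token_id[0] if eos_token_id else None
    return eos_token_id


pad_id = resolve_pad_token_id([1, 106])
```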

Verification & Performance

All models have been validated.

Fixes #1297

Type of change

New feature (non-breaking change which adds functionality)

Checklist:

I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have not rewritten tests relating to key interfaces which would affect backward compatibility

…utes
- Unwrap text_config for Gemma4ForConditionalGeneration models
- Read PLE, KV sharing, layer_types, softcapping from text_cfg
- Add NotImplementedError guard for MoE variants (26B-A4B)
- Update tests to exercise text_config path

jlarson4

Hey @huseyincavusbi, glad to finally see this come through. I have a couple of comments below; take a look when you have a moment and let me know what you think.

Additionally, @punishell has recently opened #1377, which is a parallel implementation of Gemma4. I'd like to include bits of both your implementations where it makes sense and is relevant. They came up with a very straightforward solution for the KV-cache issue that might be of use to you, if you want to try rebasing your work onto theirs as an extension point. I am thinking there may be a way to use their DelegatedAttentionBlockBridge in combination with the work you put into adding Gemma4 support to position_embeddings_attention to provide even better overall support.

There are more moving parts here than anticipated. If you have questions, please feel free to ask.

jlarson4 · 2026-06-15T15:53:40Z

+
+import pytest
+
+from transformer_lens.config.TransformerBridgeConfig import TransformerBridgeConfig


Since you began this PR, the import path for TransformerBridgeConfig was adjusted due to a name conflict introduced in a related refactor. Please update this to

from transformer_lens.config import TransformerBridgeConfig

jlarson4 · 2026-06-15T15:58:00Z

+        # with a specific transformers version). Set self.cfg.use_native_generate = True
+        # in the adapter's __init__.
+        if getattr(self.cfg, "use_native_generate", False):
+            return self.hf_generate(


This delegation drops kwargs that a user may pass in: stop_at_eos, prepend_bos, padding_side, freq_penalty, use_past_kv_cache, as well as the new stop_strings/stopping_criteria added in #1374, to name a few. Someone using Gemma4 who calls generate(..., stop_strings=".") would have it silently ignored.

If you end up opting to keep use_native_generate, we will need to make sure all relevant kwargs are properly passed through.
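
One way to avoid the silent-drop problem is to translate what can be translated and warn about the rest. The sketch below is an illustration only: `build_delegate_kwargs` is a hypothetical helper, and the `PASSTHROUGH` and `RENAMED` tables are assumptions about which bridge-level arguments have a counterpart on the delegated path, not something taken from this PR.

```python
import warnings
from typing import Any, Dict

# Assumed mapping for illustration; the real set of supported kwargs
# would need to be checked against the delegated generate() call.
PASSTHROUGH = {"stop_strings", "stopping_criteria"}
RENAMED = {"use_past_kv_cache": "use_cache"}


def build_delegate_kwargs(user_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Forward supported kwargs and warn about any that cannot be forwarded."""
    delegated: Dict[str, Any] = {}
    dropped = []
    for name, value in user_kwargs.items():
        if name in PASSTHROUGH:
            delegated[name] = value
        elif name in RENAMED:
            delegated[RENAMED[name]] = value
        else:
            dropped.append(name)
    if dropped:
        # Surface the problem instead of ignoring the arguments silently.
        warnings.warn(
            f"Ignored when delegating generation: {sorted(dropped)}",
            UserWarning,
        )
    return delegated
```

With this shape, a call such as `build_delegate_kwargs({"stop_strings": ".", "freq_penalty": 1.0})` forwards `stop_strings` and emits a warning naming `freq_penalty`, so the user at least learns that the argument had no effect.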

jlarson4 · 2026-06-15T16:08:02Z

            return self.hf_generate(input, **hf_kwargs)

+        # Adapters can opt-in to delegating generation to HF's native generate()
+        # (e.g. when the bridge's custom attention has a KV-cache incompatibility


Does no-cache hooked generation (use_past_kv_cache=False) work for Gemma4 with this incompatibility? If so, that's a better stopgap than delegating to hf_generate. It preserves hooks and lets you drop the use_native_generate flag. If you could dig into that and let me know what you find, I'd appreciate it.
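
For reference, no-cache generation simply re-runs the full sequence through the model at every step, so no KV-cache state is carried between steps. The sketch below shows the idea with a toy stand-in model; `greedy_generate_no_cache` and `toy_next_token_logits` are hypothetical names for illustration and are not the bridge's actual generate loop.

```python
from typing import Callable, List


def greedy_generate_no_cache(
    next_token_logits: Callable[[List[int]], List[float]],
    prompt: List[int],
    max_new_tokens: int,
    eos_token_id: int,
) -> List[int]:
    """Greedy decoding that recomputes the whole sequence on every step."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # Full forward pass over all tokens so far; nothing is cached.
        logits = next_token_logits(tokens)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_id)
        if next_id == eos_token_id:
            break
    return tokens


def toy_next_token_logits(tokens: List[int]) -> List[float]:
    """Toy model over a 5-token vocabulary: always predicts (last + 1) % 5."""
    logits = [0.0] * 5
    logits[(tokens[-1] + 1) % 5] = 1.0
    return logits


generated = greedy_generate_no_cache(
    toy_next_token_logits, prompt=[0], max_new_tokens=10, eos_token_id=3
)
```

The cost is quadratic recomputation over the sequence length, but every step goes through the ordinary forward pass, which is why hooks would remain active in this mode.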

huseyincavusbi added 24 commits June 13, 2026 14:30

feat: Initial Gemma4 architecture adapter (V norm, softcap, PLE/KV co…

c19a062

…nfig)

feat: Register Gemma4ArchitectureAdapter in factory and __init__

5d5564d

feat: Add final_rms and eps_attr to Gemma4 adapter config

b1c2a3d

fix: Use setattr for custom config fields to pass mypy

39565bc

fix: Register Gemma4ForConditionalGeneration alias

eaf190c

fix: Dynamic text prefix for text-only vs multimodal Gemma4 variants

cadfe52

fix: Add Gemma4 to model_registry and add unit tests

79d8de4

Remove dead v_norm weight conversion (with_scale=False has no learnab…

eb9f214

…le weight)

Add full Gemma4 MoE support with optional submodules for 26B-A4B

7fb469e

Make k_proj, v_proj, k_norm, v_norm optional for KV-sharing layers

decefd8

fix: AutoModel returns Gemma4Model directly, correct text_prefix

2700965

fix: revert text_prefix — AutoModelForCausalLM needs model. prefix

74b6168

fix: check cfg.architecture instead of cfg.architectures for prefix d…

7033b91

…etection

fix: delegate to original attention on KV-sharing layers

b54e2cd

fix: store computed KV in shared_kv_states for Gemma4 KV-sharing

d5ce541

fix: add Gemma4ForConditionalGeneration to MULTIMODAL_ARCHITECTURES

ee60b1c

fix: add use_native_generate opt-in flag for hf_generate delegation

6587691

feat: use_native_generate and prepare_loading for Gemma4 adapter

3237784

fix: handle list eos_token_id when setting pad_token_id

6a21267

fix: apply V norm in post-reshape attention phase for Gemma4

0732f27

fix: restore Gemma3nForConditionalGeneration in MULTIMODAL_ARCHITECTURES

a8d2c4e

fix: remove dead eps_attr, resolve conflict marker, fix mypy

a30f390

feat: add multimodal vision support to Gemma4 adapter

7eed605

huseyincavusbi marked this pull request as draft June 14, 2026 10:49

jlarson4 changed the base branch from main to dev June 15, 2026 15:47

jlarson4 reviewed Jun 15, 2026

huseyincavusbi wants to merge 24 commits into TransformerLensOrg:dev from huseyincavusbi:feat/gemma4-adapters
